ABSTRACT 

One purpose of this research is to develop models of 
cognitive processes in understanding mechanical systems. A particular 
focus was on the processes in mentally animating the representation 
of a mechanical system, and the contribution of animation graphics in 
comprehension. Several studies, involving eye fixations, verbal 
protocols, and process tracing, indicated that mental animation was 
difficult for individuals who were not knowledgeable about mechanics. 
Animation did help them determine the motion of individual 
components, but animation alone did not entirely compensate for the 
subjects' difficulty in identifying relevant features and ignoring 
irrelevant features. These subjects included college students, 
professional mechanics, and high school graduates applying for 
positions as firemen or policemen in New York City. A second goal of 
the research was to analyze the differences among individuals who are 
performing analytic reasoning tasks. The cognitive processes in a 
widely used, nonverbal test of analytic intelligence, the Raven 
Progressive Matrices Test, were analyzed using experimental and 
modelling techniques. Two processes that were found to distinguish 
average and superior performance are the ability to induce abstract 
relations and the ability to dynamically manage a large set of 
problem solving goals in working memory. Ten figures illustrate the 
report and a list of nine publications associated with this research 
project is included. (Contains 21 references.) (Author/ALF) 
INTRODUCTION 

This is the final report for ONR contract N00014-89-J-1218 between the Office of 
Naval Research (Manpower Committee) and Carnegie-Mellon University. Patricia A. Carpenter 
and Marcel Adam Just were the principal investigators. This contract covers work from 
December 1, 1988 through November 30, 1991. 

The overall purpose of the research is to develop models of the cognitive thinking that 
constitutes understanding mechanical systems. The comprehension of mechanical devices, 
whether in preparation for operating, assembling, or repairing them, involves constructing a 
representation of the mechanical and physical properties of the device, including the motions 
and actions of the component parts and their dynamic interrelations. A particular focus of 
this research was on the processes that occur in mentally animating the representation of a 
mechanical system, and additionally, the processes in understanding animation graphics 
systems that display mechanical motion. 

A second goal of this research is to analyze the differences among people who are 
good at various types of reasoning tasks and those who are not. Differences among 
individuals in their ability to reason are of obvious practical and scientific significance. An 
important facet of the completeness of a theory is to account not only for the effects of 
task and situation, but also for systematic differences in performance among individuals. 

The potential applications for the Navy are most obvious in two areas. The first area 
is personnel selection, specifically, better interpretation of the processes that are assessed 
by existing achievement and skill tests as well as the potential for better design of future 
tests. The second area of potential application is in training. Animation graphics open the 
possibility of new instructional techniques both in training and job situations. Research on 
the comprehension of such animation graphics has so far not kept pace with the rapid 
technological advances, so that relatively little is known about the cognitive processes that 
may make this technology more or less useful in training and learning situations. 

The research approach is to develop fine-grained analyses of the reasoning and 
visual-perceptual processes in various types of problem solving tasks. The project uses 
data-intensive methodologies, such as eye fixations and verbal protocols, that allow us to 
monitor the cognitive processes on-line, as they occur, and to relate these measures to the 
eventual outcome, such as the correctness or type of response. Thus, these investigations 
seek to analyze the micro-structure of the problem solving processes, particularly in spatial 
and mechanical domains. 

The following sections briefly summarize the research associated with the project and 
cite references to published descriptions of the work. 
I. Individual differences in working memory 

The concept of analytic intelligence is a pervasive one in personnel selection, in 
psychometric theory and in testing more generally. In spite of the widespread use of 
"intelligence" tests, there is very little research on the actual processing that such tests 
evoke. In one line of research, we have analyzed the processes that occur during various 
cognitive tasks, such as those involving spatial ability (Carpenter & Just, 1986; Just & 
Carpenter, 1985), verbal reasoning (Just & Carpenter, 1992; Carpenter & Just, 1989), 
mechanical problem solving (Hegarty, Just & Morrison, 1988; Just & Carpenter, 1987; 
Hegarty, Carpenter & Just, 1991), and complex reasoning (Carpenter, Just & Shell, 1990). 
The approach in all of these projects has been to use a variety of methods to analyze the 
ongoing thought processes of both more and less successful problem solvers, including eye 
fixations and "think aloud" protocols and other process-tracing methodologies (Just & 
Carpenter, 1976, 1988, 1987). These empirical studies are coordinated with the construction 
of detailed models of those processes, models that are often implemented as computer 
simulations. The scientific goal has been to combine a variety of techniques to specify the 
cognitive processes that underlie basic cognitive skills. 

One series of studies has characterized reasoning, with particular attention to the role of 
working memory. The initial research focused on a common psychometric test called the 
Raven Progressive Matrices Test (Raven, 1962). The Raven tests, including the simpler 
Standard Progressive Matrices Test and the Coloured Progressive Matrices Test, are widely 
used in both research and clinical settings. The test is used extensively by the military in 
several western countries (for example, see Belmont & Marolla, 1973). Also, because of its 
non-verbal format, it is a common research tool with children, the elderly, and patient 
populations for whom it is desirable to minimize the processing of language. The wide usage 
means that there is a great deal of information about the performance profiles of various 
populations. More importantly, it means that a cognitive analysis of the processes and 
structures that underlie performance has potential practical implications in the domains in 
which the test is used, either for research or for classification. 

There are several reasons why the Raven test provides an appropriate test bed to 
analyze analytic intelligence. First, the size and stability of the individual differences that the 
test elicits, even among college students, suggest that the underlying differences in cognitive 
processes are susceptible to cognitive analysis. Second, the relatively large number of items 
on the test (36 problems) permits an adequate data base for the theoretical and 
experimental analyses of the problem-solving behavior. Third, the visual format of the 
problems makes it possible to exploit the fine-grained, process-tracing methodology afforded 
by eye fixation studies (Just & Carpenter, 1976). Finally, the correlation between Raven test 
scores and measures of intellectual achievement suggests that the underlying processes may 
be general, rather than specific to this one test (Court & Raven, 1982), although like most 
correlations, this one must be interpreted with caution. 

Several different research approaches have converged on the conclusion that the 
Raven test measures processes that are central to analytic intelligence. Individual 
differences in the Raven correlate highly with those found in other complex, cognitive tests 
(see Jensen, 1987). The centrality of the Raven among psychometric tests is graphically 
illustrated in several nonmetric scaling studies that examined the interrelations among ability 
test scores obtained both from archival sources and more recently collected data (Snow, 
Kyllonen & Marshalek, 1984). The scaling solutions for the different data bases showed 
remarkably similar patterns. The Raven and other complex reasoning tests were at the 
center of the solution. Simpler tests were located towards the periphery and they clustered 
according to their content, as shown in Figure 1. This particular scaling analysis is based 
on the results from a variety of cognitive tests given to 241 high school students (Marshalek, 
Lehman & Snow, 1983). 



Insert Figure 1 - Marshalek et al. Results 

Snow et al. also constructed an idealized space to summarize the results of their 
numerous scaling solutions, in which they placed the Raven test at the center, as shown in 
Figure 2. In this idealized solution, task complexity is maximal near the center and 
decreases outward toward the periphery. The tests in the annulus surrounding the Raven 
test involve abstract reasoning, induction of relations, and deduction. For tests of 
intermediate or low complexity only, there is a clustering as a function of the test content, 
with separate clusters for verbal, numerical and spatial tests. By contrast, the more 
complex tests of reasoning at the center of the space were highly intercorrelated in spite of 
differences in specific content. 



Insert Figure 2 - Idealized Results 

One of the sources of the Raven test's centrality, according to Marshalek, Lehman and 
Snow was that "... more complex tasks may require more involvement of executive 
assembly and control processes that structure and analyze the problem, assemble a strategy 
of attack on it, monitor the performance process, and adapt these strategies as performance 
proceeds..." (1983, p. 124). This theoretical interpretation is based on the outcome of the 
scaling studies. Our research also converges on the importance of executive processes, but 
the conclusions are derived from a process analysis of the Raven test. 

A task analysis of the Raven Progressive Matrices Test suggests some of the cognitive 
processes that are likely to be implicated in solving the problems. The test consists of a 
set of visual analogy problems. Each problem consists of a 3 x 3 matrix, in which the 
bottom right entry is missing and must be selected from among eight response alternatives 
arranged below the matrix. Each entry typically contains one to five figural elements, such 
as geometric figures, lines, or background textures. The test instructions tell the test-taker 
to look across the rows and then look down the columns to determine the rules and then to 
use the rules to determine the missing entry. The problem in Figure 3 illustrates the 
format. 



Insert Figure 3 - Sample Problem 

The variation among the entries in a row and column of this problem can be 
described by three rules: 

- Rule A. Each row contains three geometric figures (a diamond, a triangle and a 
square) distributed across its three entries. 

- Rule B. Each row contains three textured lines (dark, striped and clear) 
distributed across its three entries. 

- Rule C. The orientation of the lines is constant within a row, but varies between 
rows (vertical, horizontal, then oblique). 
Figure 2. An idealized scaling solution that summarizes the relations among ability 
tests across several sets of data, illustrating the centrality of the Raven test (from Snow, 
Kyllonen & Marshalek, 1984; Figure 2.9, p. 92). The outwardly radiating concentric circles 
indicate decreasing levels of test complexity. Tests involving different content (figural, 
verbal, and numerical) are separated by dashed radial lines. (Reprinted by permission of 
authors and publisher). 
Figure 3. A problem to illustrate the format of the Raven items. The variation 
among the three geometric forms (diamond, square, triangle) and the three textures of the line 
(dark, striped, clear) are each governed by a distribution-of-three-values rule. The orientation 
of the line is governed by a constant-in-a-row rule. (The correct answer is #5.) 
The missing entry can be generated from these rules. Rule A specifies that the 
answer should contain a square (since the first two columns of the third row contain a 
triangle and diamond). Rule B specifies it should contain a dark line. Rule C specifies that 
the line orientation should be oblique, from upper left to lower right. These rules converge 
on the correct response alternative, #5. Some of the incorrect response alternatives are 
designed to satisfy an incomplete set of rules. For example, if a subject induced Rule A 
but not B or C, he might choose alternative #2 or #8. Similarly, inducing Rule B but 
omitting A and C leads to alternative #3. This sample problem illustrates the general 
structure of the test problems, but corresponds to one of the easiest problems in the test. 
The more difficult problems entail more rules or more difficult rules, and more figural 
elements per entry. 
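To make the two steps of inducing a rule from the complete rows and applying it to the incomplete row concrete, here is a minimal Python sketch. It assumes a hypothetical symbolic encoding in which each entry is a dictionary of attribute values. Only the constraints on the third row come from the text (a triangle and a diamond in its first two entries, and a dark, oblique line in the answer); the first two rows, the line textures of the third row, and the function names are invented for illustration. It is not the FAIRAVEN or BETTERAVEN code.

```python
from typing import Dict, List, Optional

Entry = Dict[str, str]  # attribute name -> value, e.g. {"shape": "diamond"}

# Hypothetical encoding of the sample problem: rows 1 and 2 are invented.
MATRIX: List[List[Entry]] = [
    [{"shape": "diamond", "texture": "dark", "orientation": "vertical"},
     {"shape": "triangle", "texture": "striped", "orientation": "vertical"},
     {"shape": "square", "texture": "clear", "orientation": "vertical"}],
    [{"shape": "triangle", "texture": "clear", "orientation": "horizontal"},
     {"shape": "square", "texture": "dark", "orientation": "horizontal"},
     {"shape": "diamond", "texture": "striped", "orientation": "horizontal"}],
]
# The incomplete third row (its textures are assumed).
PARTIAL_ROW: List[Entry] = [
    {"shape": "triangle", "texture": "striped", "orientation": "oblique"},
    {"shape": "diamond", "texture": "clear", "orientation": "oblique"},
]


def induce_rule(rows: List[List[Entry]], attr: str) -> Optional[str]:
    """Name the rule type that fits every complete row for one attribute."""
    value_sets = [{entry[attr] for entry in row} for row in rows]
    if all(len(vs) == 1 for vs in value_sets):
        return "constant-in-a-row"
    if all(len(vs) == 3 and vs == value_sets[0] for vs in value_sets):
        return "distribution-of-three-values"
    return None  # neither rule type fits


def predict(rows: List[List[Entry]], partial: List[Entry], attr: str) -> Optional[str]:
    """Apply the induced rule to the incomplete row to generate the missing value."""
    rule = induce_rule(rows, attr)
    if rule == "constant-in-a-row":
        return partial[0][attr]
    if rule == "distribution-of-three-values":
        missing = {e[attr] for e in rows[0]} - {e[attr] for e in partial}
        return missing.pop() if len(missing) == 1 else None
    return None


answer = {attr: predict(MATRIX, PARTIAL_ROW, attr)
          for attr in ("shape", "texture", "orientation")}
print(answer)  # {'shape': 'square', 'texture': 'dark', 'orientation': 'oblique'}
```

Each attribute is handled independently, so leaving an attribute out of the loop mimics a solver who induced only some of the rules: the generated description is then only partly specified, which is the situation the incomplete-rule distractors are built to exploit.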

The research is reported in Carpenter, Just & Shell (1990), which describes a 
theoretical model of the processes in solving the Raven test, contrasting the performance of 
college students who are less successful in solving the problems with those who are more 
successful. The model is based on multiple dependent measures, including verbal reports, 
eye fixations and patterns of errors on different types of problems. The experimental 
investigations led to the development of computer simulation models that test the sufficiency 
of our analysis. Two computer simulations, FAIRAVEN and BETTERAVEN, express the 
differences between good and extremely good performance on the test. FAIRAVEN performs 
like the median college student in our sample; BETTERAVEN performs like one of the very 
best. Figure 4 shows a flow-chart of the processes in BETTERAVEN. 

The simulation had several modules (Figure 4) that encode the stimuli (symbolic 
descriptions of the figures), match the encoding to rules, generalize rules, and find the 
response. But the part of the simulation that accounted for the difference between the 
median and best subjects was a goal manager. The goal manager kept track of multiple 
rules and allowed the system to backtrack when reformulating alternative rules. 
BETTERAVEN differs from FAIRAVEN in two major ways. First, BETTERAVEN can induce 
more abstract relations than FAIRAVEN. Second, BETTERAVEN can manage a larger set of 
goals in working memory and hence can solve more complex problems. In a cognitive 
"lesioning" experiment, we changed the architecture of the simulation to model individual 
differences by manipulating the capacity of the goal manager. This manipulation allowed the 
simulation to capture the differences between the median and the very best performing 
subjects. 
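A toy version of the capacity manipulation may help. The Python below is a hypothetical, greatly simplified stand-in for a goal manager, not the BETTERAVEN production system. It assumes that working memory holds the top goal, the current subgoal, and a record of each rule already found, and that exceeding a fixed capacity displaces the oldest item. The names `GoalManager` and `attempt`, and the displacement policy, are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List


class GoalManager:
    """Hypothetical capacity-limited store for goals and their results.

    Illustrative only; this is not the BETTERAVEN production system.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: Deque[str] = deque()
        self.lost: List[str] = []

    def add(self, item: str) -> None:
        self.items.append(item)
        # Over capacity: the oldest item is displaced (a crude model of forgetting).
        while len(self.items) > self.capacity:
            self.lost.append(self.items.popleft())

    def record_attainment(self, goal: str, result: str) -> None:
        # The slot of an attained subgoal is reused to record its result.
        if goal in self.items:
            self.items[self.items.index(goal)] = result


def attempt(rules: List[str], capacity: int) -> bool:
    """Set one subgoal per rule in turn; succeed only if nothing needed was lost."""
    wm = GoalManager(capacity)
    wm.add("goal:solve-problem")
    for rule in rules:
        wm.add(f"goal:induce-{rule}")
        wm.record_attainment(f"goal:induce-{rule}", f"found:{rule}")
    needed = {"goal:solve-problem"} | {f"found:{r}" for r in rules}
    return needed <= set(wm.items)


print(attempt(["A", "B", "C"], capacity=4))  # True
print(attempt(["A", "B", "C"], capacity=3))  # False: the top goal was displaced
```

Under these assumptions a problem with k rules needs k + 1 slots, so lowering the capacity leaves problems with few rules solvable while making problems with many rules fail, loosely the kind of pattern that a capacity "lesion" is meant to produce.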



Insert Figure 4 - BETTERAVEN 

The contrast between the models specifies the nature of the analytic intelligence 
required to perform the test and the nature of individual differences in this type of 
intelligence. The processing characteristic that is common to all subjects is an incremental, 
re-iterative strategy for encoding and inducing the regularities in each problem. Thus, the 
paper argues that the processes that distinguish among individuals are primarily the ability to 
induce abstract relations and the ability to dynamically manage a large set of problem- 
solving goals in working memory. 

Figure 4. A block diagram of BETTERAVEN. The distinction from FAIRAVEN 
visible from the block diagram is the inclusion of a goal monitor that generates and keeps 
track of progress in a goal tree. 

Our current conception of working memory capacity is in terms of the amount of 
activation available for both maintaining and manipulating symbolic information in reasoning 
tasks. We have developed an interpreter for a production system architecture that can be 
set to have different amounts of activation (high amounts correspond to good ability). We 
can also use this simulation to investigate different strategies for what happens to 
information (forgetting plus slowing down of processing) when there is too little activation. 
This architecture has been applied to account for several types of systematic individual 
differences in language comprehension tasks; the architecture and the empirical results are 
described in detail in several recent publications (Carpenter & Just, 1989; Just & Carpenter, 
1992; MacDonald, Just & Carpenter, 1992). 
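One way to picture an activation ceiling is sketched below. It assumes, as a simplification, that whenever the total activation demanded by stored and newly computed elements exceeds a cap, every element is scaled down by the same proportion. The function names, the threshold, and all numbers are hypothetical and are not taken from the interpreter described above. Under these assumptions a low cap produces both effects the text mentions: the same computation takes more cycles (slowing), and previously stored elements drop below threshold (forgetting).

```python
from typing import Dict, List, Optional, Tuple


def cap_activation(act: Dict[str, float], cap: float) -> Dict[str, float]:
    """If total activation exceeds the cap, scale every element down proportionally."""
    total = sum(act.values())
    if total <= cap:
        return dict(act)
    scale = cap / total
    return {name: value * scale for name, value in act.items()}


def compute_new_element(
    stored: Dict[str, float],
    increment: float,
    threshold: float,
    cap: float,
    max_cycles: int = 100,
) -> Tuple[Optional[int], List[str]]:
    """Return (cycles for a new element to reach threshold, stored items now below it)."""
    act = dict(stored)
    act["new"] = 0.0
    for cycle in range(1, max_cycles + 1):
        act["new"] += increment          # processing adds activation to the new element
        act = cap_activation(act, cap)   # storage and processing share one budget
        if act["new"] >= threshold:
            forgotten = [k for k, v in act.items() if k != "new" and v < threshold]
            return cycle, forgotten
    return None, [k for k, v in act.items() if k != "new" and v < threshold]


stored = {"a": 1.0, "b": 1.0}
print(compute_new_element(stored, increment=0.5, threshold=1.0, cap=10.0))  # (2, [])
print(compute_new_element(stored, increment=0.5, threshold=1.0, cap=2.5))   # (3, ['a', 'b'])
```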

One conclusion of the research on individual differences in reasoning tasks is that a 
key determinant of performance in complex reasoning tasks is the availability of adequate 
working memory resources both for computing and storing intermediate goals and products 
during problem solving. In particular, the executive processes that enabled problem solvers 
to generate subgoals in working memory, to record the attainment of subgoals, and to set 
new subgoals as others were attained were critical to problem solving success and a source 
of individual differences. The executive processes were examined in studies of both 
cognitive processes and individual differences as determined by the Raven Progressive 
Matrices test; the latter is a measure of fluid reasoning ability and it typically correlates 
highly with complex visual problem solving. 

Summary. This research suggests a very clear hypothesis about the nature of 
individual differences, and of task variation more generally, in analytic problem solving. 
Ongoing research seeks to re-examine conceptions of spatial problem solving skill in light of 
this theoretical model of the constraints on analytic problem solving. 



II. Mental animation and computer animation 

As background, it is useful to remember that Navy training and maintenance manuals 
include diagrams with accompanying texts that are very complex for individuals who are less 
mechanically knowledgeable. The complexity of such material is illustrated in a typical 
excerpt taken from the Navy's book "Basic Machines and How They Work." 



Insert Figure 5 - Navy Manual Excerpt 

Our research has examined the processes used in interpreting such diagrams (and 
texts) and ways to use computer technology to aid the comprehension of such 
materials. 

Individual differences in these tasks were assessed by a common test of mechanical 
knowledge called the Bennett Mechanical Comprehension Test (1969), which has some items 
that are similar to those in the ASVAB. Typically, an item shows a mechanical situation 
and asks about some physical property (such as mechanical advantage) that does not 
require complex calculation. The item in Figure 6, an isomorph of an actual item, asks about 
the relative mechanical advantage of two systems. What is important is that it implicitly pits a 
relevant feature (the weights of the two objects) against an irrelevant feature (their distances 
from the source of the force, the man). Less mechanically experienced subjects and those 
who have not had formal physics instruction are more likely to be misled by the distance 
factor. Their implicit model of the problem is that force flows from the source (the man) to 
the goal, and so the first weight (answer B) will be lifted first. By contrast, the correct 
analysis is that the tension is equal throughout the rope, and so the lighter weight (answer A) 
will be lifted before the heavier weight. Hence, the correct answer is A. 
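The contrast between the two implicit models can be written as two decision rules. The sketch below assumes an ideal (massless, frictionless) rope from which each load hangs by a single strand, and it uses invented weights and distances; the geometry of the actual test item is not reproduced here, and the function names are hypothetical. The novice rule ranks the loads by distance from the man, while the equal-tension rule ranks them by weight, because as the man pulls harder the common tension first exceeds the smaller weight.

```python
from typing import Dict, Tuple

# Hypothetical item: (weight, distance from the man), in arbitrary units.
LOADS: Dict[str, Tuple[float, float]] = {
    "A": (50.0, 4.0),  # lighter, farther from the man
    "B": (80.0, 1.0),  # heavier, nearer to the man
}


def source_flow_answer(loads: Dict[str, Tuple[float, float]]) -> str:
    """Novice model: force 'flows' from the man, so the nearest load rises first."""
    return min(loads, key=lambda name: loads[name][1])


def equal_tension_answer(loads: Dict[str, Tuple[float, float]]) -> str:
    """Ideal rope: tension is the same at every point, so as the pull increases
    the first load whose weight is exceeded (the lightest) rises first."""
    return min(loads, key=lambda name: loads[name][0])


print(source_flow_answer(LOADS))    # B
print(equal_tension_answer(LOADS))  # A
```

Distance never enters the second rule, which is the sense in which it is the irrelevant dimension for this item.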
Hydraulics Aid the Helmsman 

You've probably seen the helmsman swing a ship weighing thousands of tons about as easily 
as you turn your car. But he's not a superman. He does it with machines. 

Many of these machines are hydraulic. There are several types of hydraulic and 
electrohydraulic steering mechanisms, but the simplified diagram in figure 10-11 will help you 
to understand the general principles of their operation. As the hand steering wheel is turned 
in a counterclockwise direction, its motion turns the pinion gear g. This causes the left-hand 
rack r1 to move downward, and the right-hand rack r2 to move upward. Notice that each rack 
is attached to a piston p1 or p2. The downward motion of rack r1 moves piston p1 downward 
in its cylinder and pushes the oil out of that cylinder through the line. At the same time, 
piston p2 moves upward and pulls oil from the right-hand line into the right-hand cylinder. 

If you follow these two lines, you see that they enter a hydraulic cylinder S, one line 
entering above and one below the single piston in that cylinder. In the direction of the oil 
flow in the diagram, this piston and the attached plunger are pushed down toward the 
hydraulic pump h. So far, in this operation, you have used hand power to develop enough oil 
pressure to move the control plunger attached to the hydraulic pump. At this point an electric 
motor takes over and drives the pump h. 

Oil is pumped under pressure to the two big steering rams R1 and R2. The pistons in these 
rams are connected directly to the rudder crosshead which controls the position of the 
rudder. With the pump operating in the direction shown, the ship's rudder is thrown to the 
left, and the bow will swing to port. This operation demonstrates how a small force applied 
on the steering wheel sets in motion a series of operations which result in a force of 
thousands of pounds. 
Figure 5. An example of text and a mechanical diagram from the Navy manual 
entitled Basic Machines and How They Work. 
Insert Figure 6 - Mechanical Knowledge Test Item 



It is reasonable to claim that people who understand mechanical systems can infer the 
principles of operation of an unfamiliar device from their knowledge of the device's 
components and their mechanical interactions. Individuals vary considerably in their ability to 
make this type of inference. Hegarty, Just & Morrison (1988) report studies of the 
performance of college students on psychometric tests of mechanical ability. Based on 
subjects' retrospective protocols and response patterns, it was possible to identify rules of 
mechanical reasoning that accounted for the performance of subjects of different levels of 
mechanical ability. The rules are explicitly stated in a simulation model, which demonstrates 
their sufficiency by producing the kinds of responses observed in the subjects. Three 
abilities are proposed as the sources of individual differences in performance: 

(1) the ability to correctly identify which attributes of a system are relevant to its 
mechanical function; 

(2) knowledge of a general functional relation between the attribute and the outcome 
(in this case, mechanical advantage), and the ability to use the rule or relation consistently; and 

(3) the ability to combine information about two or more relevant attributes, initially 
qualitatively and then quantitatively. 

A series of protocol studies using carefully constructed items revealed that mechanical 
knowledge contributes to problem solving in the domain of mechanics in two ways: by 
increasing the likelihood of identifying the relevant attributes of a system, and by providing 
qualitative and quantitative rules that relate these attributes to mechanical advantage. 
Without the relevant mechanical knowledge, such devices were internally represented in a 
fragmentary and non-functional way. 

Mechanical reasoning by students and professional mechanics. In the next 
section, we describe several studies of mechanical reasoning in students and professional 
mechanics. These studies were actually preliminary to the simulation and experimental 
studies reported above. Their importance here is that they support the claim that the 
reasoning processes reported above are fairly general, both across different populations and 
across different types of reasoning tasks. 

Both book learning and hands-on experience under the car hood may improve 
mechanical reasoning. Because the studies are all correlational, they are only suggestive; 
nevertheless, we examined the impact of both practical mechanical experience and formal 
training in physics principles (operationalized as 1 year or more of college physics) on 
performance in the Bennett. "Mechanical experience" was operationalized as the person's 
report of some specific categories of practical mechanical experience, such as fixing small 
appliances (for example, a toaster or a lamp); assembling a mechanical object, such as a 
bicycle or wheelbarrow; or participating in activities such as car repair. Either no mechanical 
experience or very sporadic and superficial experience was classified as "No Reported 
Experience." The same classification was used in a second study in which subjects were 
asked to "talk aloud" while solving the problems, to allow us to analyze their processes. 
The test scores were similar for the two tasks, suggesting that talking aloud did not affect 
overall problem solving success. In addition, to examine the contribution of spatial 
training to mechanical problem solving, we recruited 14 architecture majors; these students 
were essentially without formal physics training. The data indicate contributions of both 
mechanical experience and school courses for each group. The performance of the 
architecture students suggests that, in spite of their lack of formal training, relevant 
experience led to similar levels of performance. 

Figure 6. Example of an item that is similar to those in the Bennett Mechanical 
Comprehension Test. An irrelevant dimension (distance from the person) is pitted against a 
relevant dimension (weight of the items to be lifted). The correct answer is "A". 



Score (on 68-item Bennett Test) 

                                            Mechanical Experience   No Mechanical Experience 

Unselected College Students, Standard Test 
    1 year physics [n=11,13]                51.3 (s.d.=11)          42.8 (s.d.=12) 
    No physics [n=4,21]                     46.7 (s.d.=12)          33.0 (s.d.=8) 

Unselected College Students, "Thinking Aloud" Test 
    1 year physics [n=7,7]                  53.5 (s.d.=5)           40.1 (s.d.=16) 
    No physics [n=6,9]                      37.5 (s.d.=15)          30.3 (s.d.=12) 

Architecture Students 
    No physics [n=7,7]                      51.1 (s.d.=8)           44.7 (s.d.=8) 
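Expressing the table as differences makes the pattern easier to see. The snippet below only subtracts the mean scores reported in the table above (the dictionary keys are our own labels); it performs no significance tests, and with cells as small as n=4 the differences should be read as descriptive only.

```python
# Mean Bennett scores from the table: (mechanical experience, no mechanical experience).
MEANS = {
    ("standard", "1 year physics"): (51.3, 42.8),
    ("standard", "no physics"): (46.7, 33.0),
    ("thinking aloud", "1 year physics"): (53.5, 40.1),
    ("thinking aloud", "no physics"): (37.5, 30.3),
    ("architecture", "no physics"): (51.1, 44.7),
}

# Advantage associated with reported mechanical experience, row by row.
experience_gap = {key: round(mech - none, 1) for key, (mech, none) in MEANS.items()}

# Advantage associated with 1 year of physics, within each experience column.
physics_gap = {
    (test, column): round(
        MEANS[(test, "1 year physics")][i] - MEANS[(test, "no physics")][i], 1
    )
    for test in ("standard", "thinking aloud")
    for i, column in enumerate(("mechanical experience", "no mechanical experience"))
}

for key, gap in experience_gap.items():
    print(key, gap)
for key, gap in physics_gap.items():
    print(key, gap)
```

Every difference in both dictionaries is positive, which is the descriptive pattern behind the statement that both mechanical experience and school courses contribute.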



This correlational analysis must be interpreted cautiously because of the obvious lack 
of control over the characteristics of who might end up in these various self-reported cells. 
Nevertheless, our data suggest that both mechanical experience and formal training are 
associated with higher scores. An additional point is that little of the formal instruction in 
college physics directly addresses the mechanical, electrical, and kinematic situations that 
are probed in the more practically oriented items in the Bennett Test (1969). Consequently, 
the transfer that occurs from the course work may be at a more abstract level, such as 
learning the general principles. In addition, some of the difference in the scores may reflect 
more general selection characteristics of who takes college physics and who tends to have 
mechanical experience. 

An additional point, which is relevant to the generality of our subsequent studies, is 
that the performance of the college students in most conditions is comparable to that cited 
in the Bennett Manual (1969) from a study of 315 applicants for "technical defense 
courses." 



                   Mechanical Experience   No Reported Experience 

[n=220,95]         41.7 (s.d.=8.6)         39.7 (s.d.=8.9) 



These means from the manual are similar to those obtained in a much larger study 
of applicants for positions as firemen or policemen in New York City; the mean score for 
the 879 high school graduates (excluding data from those who had attended college) was 
36.7 (s.d.=9.7). Thus, the mechanical problem solving skills of most of the unselected 
subjects in our studies (with the exception of those who take college physics and report 
considerable mechanical experience) are roughly similar to those found in less selective 
populations. 

It is also the case, however, that general problem solving skills confer some advantage 
in mechanical problem solving. The Bennett manual reports correlations of .40-.60 between 
the Bennett and (unspecified) intelligence tests. Consistent with the general result, 
in Experiment 2 (involving verbal protocols), the correlation for 29 subjects between Bennett 
and reported verbal SAT was .40. From this positive correlation, one might expect that 
more selective populations, selected by measures related to intelligence test scores, will tend 
to have higher scores on mechanical problem solving tests. 

The important point here is that the processes in solving mechanical problems revealed 
in these students may generalize to other populations. 

The naturally curious reader might wonder about the levels of performance by the 
professional mechanic - the person to whom one entrusts one's Ford on the bad day that 
it stalls on Main Street. Are professional mechanics immune to the errors that plague mere 
mortals? In fact, some window on the extremes of experience was provided by a group of 
professional mechanics who solved the Bennett while talking aloud about their hypotheses 
and ideas. These were 13 adults who made their living as mechanics, including 3 airplane 
mechanics, 1 auto mechanic and 9 professional bicycle mechanics (two of whom had been 
mechanics in the Armed Forces). Their professional experience ranged from a minimum of 
1 year to, at the other extreme, 28 and 41 years of experience (for two of the airplane 
mechanics). But "older" did not prove to necessarily be wiser; for these subjects, the 
correlation between years of professional mechanical experience and Bennett score was r = 
.08. Anecdotally, the actual mechanical experience differed among these individuals in spite 
of the shared job title. For example, the one auto mechanic said that most of his job was 
simply replacing parts that he was told to replace; he said that he seldom mechanically 
repaired broken parts. Perhaps not surprisingly, some mechanical jobs may not yield the 
nuts-and-bolts experience that the layman naively associates with the position. 

The group's mean score was 52, with an average of 15.9 errors on the 68 
problems. Interestingly, these professionals tended to make errors on the same problems 
that caused difficulty for the amateurs; the correlation over the 68 problems between the 
error rates for the two groups was .80. The reasons that mechanics gave for their answers 
were generally similar to those given by the other high-scoring group, college students 
who had at least 1 year of a college physics course. One major difference between the 
groups, summarized below, was that professional mechanics were more likely to give no 
reason or simply to restate the problem. It may be that the professionals had implicit rules but 
were less likely to have learned the explicit rules that college students could state in giving 
their rationale for an answer. 
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Experienced 



Professional 
Mechanics 



1-yr Physics 
Students 



No reason beyond problem statement 18% 

Mentions correct dimension and functional relation 43% 

Attend to an irrelevant dimension 17% 

Gives an incorrect rule or ignores the relevant dimension 23% 



5% 
70% 
16% 

9% 



In sum, successfully solving these mechanics problems involves learning the relevant
dimensions and not being attracted by fortuitous variation in an irrelevant dimension; it also
involves learning the general type of functional relation that links the relevant dimension to
the issue (such as mechanical advantage). With formal training, students also learn precise
quantitative rules, and they may be more likely to learn the terminology to describe the
relevant principles, even though quantitative rules are not required to solve the qualitative
Bennett-type problems.

Comprehension of mechanical diagrams. The processes in successfully 
understanding a novel device or situation may seem complex, as witnessed by the difficulty 
that otherwise reasonable adults experience when confronted with the task of assembling a
child's bike. Or, in the context of the Navy, consider the difficulty of understanding the
explanation (given earlier) that we excerpted from the Navy's manual on how "hydraulics
aids the helmsman". The nature and complexity of the processing in comprehending
mechanical systems were apparent in a series of studies on how people reason about novel
mechanical devices. One purpose of these studies was to understand the reasoning 
processes and sources of error; a related goal was to understand the role of mental 
animation and the depiction of animation in a graphics display. The question was whether 
a good graphics display could circumvent some of the difficulties that viewers have in 
understanding how mechanical things work. 

In a typical experiment, the subject was shown a diagram and brief text that described 
a simple, novel device. The device was simple in the sense that it was created from a 
small number of common mechanical components, such as levers, gears, and ratchets. 
Although the device was simple, the task was not; many subjects had great difficulty figuring
out what the device was doing. The difficulties experienced by these college students may 
be reasonably representative of the difficulties experienced by other, less selective adult 
populations. 

The task that the subject faced can be understood by considering a typical device,
called the ratchet device, shown in Figure 7. The task is to determine the motion of the
wheel when the handle is pumped. (The answer is that the gear turns clockwise.) Figure
8 shows another example, called the pencil device. One can read the text, look at the
diagram, and try to solve the problem given to the subject: the reader's task is to
determine how the pencil moves when the drive gear is turned clockwise. (Alternatively, the
less mentally energetic might simply accept the answer that the pencil will trace a figure-8
that is oriented sideways.)

Insert Figures 7 and 8 - Mechanical Devices

To understand the intellectual sleuthing we undertook to pinpoint the difficulties in
understanding how these things work, it is useful to describe the scene of the crime, so to
speak; in this case, the scene was a study in which we recorded the eye fixations of
subjects reading the text and inspecting the diagram. A task analysis of how subjects
understand such a device suggested that comprehension involves determining how connected
components interact and that this inference is done by "mentally animating" various joints
along a line of action connecting the input force to the output. If so, then "mental
animation" may be an important aspect of comprehension; more importantly, relieving the
burden of mental animation by providing an animated display might improve the
comprehensibility of such devices. Therefore, the research contrasted a condition in which the
display was static (as a diagram in a book) with one in which either the entire device or
some component could be animated (usually at the viewer's discretion). The following
sections describe three studies, one involving eye fixations, another with verbal protocols,
and a third using a technology in which subjects explored the text and diagram by using a
mouse to determine what components or sentences were visible. Throughout these studies,
we found that for these devices, subjects who had more mechanical knowledge were not
typically helped by the animation. It is as though they had sufficient schemas to infer the
motions of the components and interactions for these devices. More surprisingly, the
lower-knowledge individuals were not helped very much either. The ability to animate the
display decreased some of their mistakes in mentally animating a joint; on the other hand,
combining successive animations to determine interrelations among non-adjacent
components still appeared to be problematic. "Seeing" the animated device is not a
transparent perceptual process, but rather a complex cognitive-perceptual process.

Experiment 1: Eye Fixations. In the first project, we analyzed how subjects
inspected the diagram by recording their eye fixations. Forty undergraduates studied the
ratchet device (after some preliminary familiarization with the procedure, display, and
equipment). They were given as much time as they required. Then they were given
2-alternative and 4-alternative multiple-choice questions about the functioning of the system,
such as (1) What statement best describes the motion of the gear as the handle is
pumped? (2) What happens to the small vertical connecting lever when the handle is pulled?
(3) What happens to the upper bar when the handle is pulled? Finally, they were asked to
draw a picture of the device.

The subjects' mechanical knowledge was assessed by using a modified version of the
Bennett in which we eliminated the 20 least informative questions. The remaining 48
questions were those that had the best item-response characteristics (namely, an ogive
function when the proportion correct for that item is plotted as a function of total score on
the test), using data from the earlier studies of the Bennett Test. The subjects were divided
into higher- and lower-scoring groups, with an average of 82% correct for the higher-scoring
group and 61% for the lower-scoring group on the shortened Bennett.

The results. In answer to one of the major questions that motivated this study,
animation had, at best, small and localized effects on subjects' understanding of how the
device worked. On the 9-question test asking about the device and its components,
lower-knowledge subjects answered 2.7 and 3.6 questions correctly, and higher-knowledge
subjects answered 5.3 and 4.7, for the static and animated conditions, respectively; only
knowledge, and not animation, had a significant effect, F(1,36) = 13.48, p < .01.

This machine makes a gear wheel turn when
the handle is pumped. The machine consists
of a handle linked by a system of levers
and bars to the gear wheel. When the handle
is pulled, the upper bar turns the gear while
the tooth in the lower bar slides over the
gear teeth. When the handle is pushed, the
lower bar turns the gear while the tooth on
the upper bar slides over the gear teeth.

Figure 7

This machine moves a pencil when the leftmost gear,
called the drive gear, is turned. The machine
consists of an upper bar, a lower bar, a large
upper gear, a smaller lower gear, and the drive gear,
as labelled in the diagram. The upper and lower gears
have pins mounted perpendicular to their surfaces
and near their edges through which the gears interact
with the bars. The pencil is perpendicular to the paper
and mounted through both bars.

Your task is to figure out the shape of the line that
would be drawn by the pencil when the drive gear is
turned clockwise.

Figure 8

The most striking evidence of the fragmentary representation of lower-knowledge
subjects was their subsequent drawings. We analyzed the drawings using a condition-blind
scoring of the presence/absence of the major functional components; 70% of the
lower-knowledge subjects' drawings had major errors, compared to only 45% for the
higher-knowledge subjects. Moreover, the animated display did not ameliorate this difficulty; the
likelihood of major structural errors was almost identical for the static and dynamic displays.
The examples of the drawings in Figure 9 most graphically convey their confusions and
mistakes. As these samples indicate, many subjects, particularly lower-knowledge subjects,
had fundamental misconceptions about the major functional components and their
interrelation.



Insert Figure 9 - Drawings of Ratchet Device 

Lower-knowledge subjects are more driven by the text in learning about the device, as
indicated by the relatively longer time (44 sec) they spent reading the text and the shorter
time (35 sec) they spent inspecting the diagram, compared with the higher-knowledge subjects
(34 sec and 40 sec, respectively), F(1,36) = 8.53, p < .01. On average, 6 seconds were spent
actually animating the display; this was additional time on the diagram, and animation had no
influence on the time spent reading the text. In spite of the reading and detailed
inspection of the diagram, lower-knowledge subjects had only fragmentary knowledge about
the device.

Experiment 2: Verbal Protocols and Supplemented Descriptions. If lower 
knowledge subjects are so dependent on the text for guidance, perhaps a text that provided 
a great deal of guidance could break the bottleneck to improve their understanding. To test 
this hypothesis, we compared the standard description to another version that was
supplemented by instructions to imagine the motion of components in a sequence that
corresponded with the line of action from input to output. In addition, we asked subjects to 
"think aloud" while they read the description and inspected the diagram. Forty students 
participated in the study, half of whom were given the supplemented description. 

Disappointingly, the supplemented description was unable to break the bottleneck in 
comprehension. Few subjects (8 in the regular description condition and only 5 in the 
supplemented description condition) accurately described the motion of the gear wheel for 
the ratchet device. And overall, their question-answering skill was at a level similar to that 
in the eye fixation study. Some suggestion of the source of the difficulty came from the 
verbal protocols of subjects who failed to determine the motion of the gear. They were less 
likely to follow a line of action; in addition, they were more likely to make an error in their
inference about the direction of motion of a component. In sum, supplementary text did not
improve comprehension, but the protocols strongly supported our task analysis that
comprehension involves mentally animating the interacting components along a line of action.
An inability to do such animation or follow a line of action was correlated with mistakes in
understanding the device.

Experiment 3: Moving with a Mouse. The next hypothesis to be evaluated followed
from the observation that better subjects mentally animate each joint as they follow a line of
action; therefore, perhaps comprehension would improve if viewers were guided along lines
of action and were also able to animate the display of a joint. Before describing the
interesting technology that let us do this, it is useful to give the bottom line: even this
combination of animation and guidance did not dramatically improve the understanding of the
lower-knowledge subjects. Subjects made fewer errors on the motions of individual
components, but they were not helped in inferring interrelations or putting together successive
components. To jump further ahead, it is even possible to speculate on why this
intervention was unsuccessful. A plausible analogy can be drawn to educational research
that attempted to develop reading skill in less skilled children (and adults) by training them
to fixate at the same rate or in a similar pattern as good readers. The problem with this
reading intervention is that it was aimed at an effect, not a cause. Good readers were faster
as a result of better lexical, syntactic, and semantic representations and processing, as well
as more capacity to retain the intermediate and final products of their comprehension. The
suggested analogy is that higher-knowledge individuals show consistency in tracing lines of
action because they are more adept at accessing and assembling, from their knowledge base,
appropriate representations that guide the encoding of relevant components, as well as their
inferences about action.

Figure 9 panels - Typical drawing by Higher Knowledge Subject; Typical drawings by Lower Knowledge Subjects (two examples)

The technology. The software was developed to be analogous to the "moving window"
technology used in reading research. The idea is to limit what parts of the display are
visually available and allow the subject to determine when and where to move to the next
part. Thus, the experimenter can measure the sequence and viewing duration for each portion
of the text and diagram. Subjects selected which portion they saw by moving a
mouse pointer into the region of the display screen associated with the portion. The
amount of text visible in one portion was one paragraph. Hence, if a subject moved the
mouse pointer onto some obscured text, all the words in that paragraph would become
visible. Text was obscured by replacing every letter with an "x". For the diagram, either
two or three contiguous components were visible in a portion. (In the ratchet diagram, in
addition to the two contiguous components that were displayed, the handle was also always
visible to indicate whether it was in the push or pull phase of the cycle.) Device
components were obscured by removing all detail, such as gear teeth, pivots and linkages,
and replacing them with dimly illuminated blocks of grey. Consequently, the viewer always
had some visual display of a component in the periphery, but no detailed information.
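To make the text-masking rule concrete, here is a minimal Python sketch of the text half of the moving-window display: every letter of a paragraph is replaced by "x" unless the mouse pointer is over that paragraph. It is an illustration under stated assumptions, not the software used in the study. The function names are hypothetical, paragraphs are assumed to be plain strings, and the mapping from pointer position to a paragraph index is assumed to happen elsewhere. Spaces, punctuation, and digits are left in place, which is one reading of "replacing every letter".

```python
from typing import List, Optional


def mask_text(paragraph: str) -> str:
    """Obscure a paragraph by replacing every letter with 'x'.

    Spaces, punctuation, and digits are kept, so the layout of the
    paragraph stays visible but the words cannot be read.
    """
    return "".join("x" if ch.isalpha() else ch for ch in paragraph)


def render_text(paragraphs: List[str], visible_index: Optional[int]) -> List[str]:
    """Return the text display with at most one readable paragraph.

    visible_index is the (hypothetical) index of the paragraph under the
    mouse pointer, or None when the pointer is outside the text region.
    """
    return [
        p if i == visible_index else mask_text(p)
        for i, p in enumerate(paragraphs)
    ]


if __name__ == "__main__":
    text = [
        "When the handle is pulled, the upper bar turns the gear.",
        "When the handle is pushed, the lower bar turns the gear.",
    ]
    # Pointer over the first paragraph: only that paragraph is readable.
    for line in render_text(text, visible_index=0):
        print(line)
```

Keeping non-letter characters preserves word lengths and line layout, so the masked paragraphs still occupy the same screen regions that the pointer can move into.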

To ensure that subjects looked at components in the order specified in these texts,
the control program would only permit subjects to select views in the same order as
specified in the text. This program permitted subjects to select as many or as few
components in a line of action as they chose, but the first component selected had to be
the handle, and successive components had to follow the line of action.
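The ordering constraint can be sketched as a small gate that accepts a selection only if it is the next component along the line of action. This is a hypothetical illustration, not the original control program: the class name, the component names, and the choice to refuse re-selection of an already viewed component are assumptions. Because subjects could stop at any point, the gate never requires the chain to be completed.

```python
from typing import List


class LineOfActionGate:
    """Accept component selections only in line-of-action order.

    Hypothetical sketch: the first selection must be the handle, and each
    later selection must be the next component along the line of action.
    """

    def __init__(self, line_of_action: List[str]) -> None:
        if not line_of_action or line_of_action[0] != "handle":
            raise ValueError("the line of action must start at the handle")
        self.line_of_action = line_of_action
        self.position = 0  # number of components viewed so far

    def select(self, component: str) -> bool:
        """Return True and advance if `component` is the next allowed view."""
        at_end = self.position >= len(self.line_of_action)
        if not at_end and component == self.line_of_action[self.position]:
            self.position += 1
            return True
        # Out-of-order (or repeated) selections are refused; the view stays put.
        return False
```

For example, with a hypothetical line of action of handle, lever, upper bar, gear wheel, selecting the upper bar first is refused, while selecting the handle and then the lever is accepted; a subject who stops after the lever simply never calls `select` again.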

To determine the effect of providing subjects with multiple views of the diagrams, an
additional animation condition was run in which the entire display was visible. When the
display was animated, all of the components of the device moved and were visible to the
subject to inspect freely (as in the animation condition of the eye fixation study).

Subjects were also familiarized with real physical models before the experiment in
order to ensure that difficulties didn't arise from a lack of understanding of various symbols.
The models demonstrated the difference between pivots and linkages, and introduced the
graphic symbols that were used in the computer displays to represent pivots and linkages.

One hundred and one undergraduates from Carnegie Mellon served as subjects in the 
experiment. 

Results. The most interesting results arose from an analysis of differences among
questions. Specifically, animation improved the ability of lower-knowledge subjects to answer
questions about the motion of a component, or of components at a joint, that were explicitly
mentioned by the text. The improvement, therefore, was very local. With the supplemented
description and animation, subjects averaged 4.0 correct answers, significantly better than
the 2.3 correct answers with the regular description, t(18) = 2.85, p < .01. However,
in this condition, the display also controlled the order in which components could be
inspected. The effect of animation is also consistent with the claim that lower-knowledge
subjects are text driven; if the text directs them to evaluate the motion of a specified
component, they can use the display animation to "read off" the motion. This also explains
why there may be no general effect of animation on lower-knowledge subjects: animation
does not provide the more general abstract schema that they may need to construct a
better mental model of the device. In contrast, the high-ability subjects are able to make
some inferences about the motions of components, whether or not the text directs them to
do so. Higher-knowledge subjects generally performed better on questions that depended on
making inferences from the diagram, irrespective of whether the text mentioned those
specific components. Given that the high-ability subjects can make some inferences from
the diagram without being directed by the text, it follows that the animation will not be as
useful to these subjects.

The drawings were scored according to the presence of the major functional 
components on a scale from 0-7, where 7 points were given to a drawing in which all of 
the functionally significant structures were present and correctly positioned. No points were 
given or taken away for quantitative features (such as the number of gear teeth, the size of 
components) that did not affect the general functioning of the device. The kinds of
drawings were similar to those in Figure 9; Figure 10 shows examples for the pencil
device.



Insert Figure 10 - Drawings of Pencil Device 



Lower Knowledge Subjects
Comprehension Errors and (Rating of Drawing)

                                   Display Type
Description       Static        Animated      Entire Animation
Supplemented      4.3 (2.0)     3.0 (1.5)
Normal            3.7 (1.7)     4.7 (0.7)     3.9 (1.7)
Average           4.0 (1.8)     3.8 (1.1)     3.9 (1.7)



In the supplemented-animated condition, the subject did not see the entire display
animated, but only individual joints. Consequently, some of their errors might be attributed
to the necessity of integrating pieces of information. However, this hypothesis is not
supported, because low mechanical subjects who could animate the entire display had
marginally higher error rates, 3.9 errors compared to 3.0 errors in the supplemented-animated
condition. In the entire-animation condition, all ten subjects animated the display in the
pull cycle and eight also animated it in the push cycle. Thus, all of the information about
the motion of various components was available to most of the subjects. Its availability
makes it surprising that 40% of these subjects made errors answering the targeted question,
namely, what direction the gearwheel moves when the handle is pushed and pulled. The fact
that they didn't understand the way motion was transmitted to the gearwheel, or the
gearwheel's motion itself, even when they could view the entire device, highlights the role
that encoding plays in understanding the animated display.

The question-answering performance for the static display conditions replicated the
results of Experiment 1, showing no improvement due to the supplemented description.
Moreover, the results were generally consistent with the hypothesis that low ability subjects
have difficulty with mental animation. Subjects in the supplemented condition made a few
more errors (4.3 errors) than those in the normal condition (3.7 errors), but the difference
was not significant, t(18) < 1. Hence, supplementing the text alone, without the capability
to animate the display, does not help low ability subjects.

The low ability subjects also made consistent, major errors in their drawings of the
device's structure, suggesting that the low mechanical subjects did not encode or appreciate
the relevant geometric structure. The table above shows the average rating for the low
ability subjects' drawings on the scale that ranged from 0 to 7 points. Most drawings, in
fact 31 of the 50, had major structural errors in the location, number, and nature of the
components (exclusive of the gear teeth), and 38 had major errors in drawing the gear teeth
(either no teeth, symmetrical teeth, or teeth that were backwards).

The drawings and question answering were not highly correlated for the low
mechanical subjects, r(48) = .24, in contrast to the high correlation we will report for the
high mechanical ability subjects. The dissociation between the drawing and question
answering for the low mechanical subjects suggests that the animated display helped them
encode information about the components' movement, but did not improve their
understanding of how the motion was determined by the geometric structure of the device.

In contrast to the low mechanical subjects, many high mechanical subjects did
understand the structure and motion of the ratchet device, as reflected in significantly better
question answering and in their drawings. In fact, better comprehension scores correlated
with higher ratings of the drawing across the 51 high mechanical subjects, r(49) = -0.65,
p < .01 (the correlation is negative because comprehension was scored as the number of
errors). An obvious interpretation of this correlation is that an accurate encoding of the
structure permitted subjects to make the correct kinematic inferences.



Higher Knowledge Subjects
Comprehension Errors and (Rating of Drawing)

                                   Type of Diagram
Description       Static        Animated      Entire Display
Supplemented      2.3 (5.1)     2.1 (4.5)
Normal            2.5 (5.3)     1.8 (3.9)     2.4 (4.0)
Average           2.4 (5.2)     2.0 (4.2)     2.4 (4.0)

The average drawing of the high ability subjects included many, but not all, of the
major structural components in their proper configuration. Out of a maximum of 7 points,
the average rating was 4.5 points, a rating that would typically reflect a drawing that was
missing the pivots for the lever and handle, but had a correct representation of the major
components, their configuration, and the asymmetry of the gear teeth.

This correlation between the drawing and comprehension score was slightly larger in
the static conditions, where subjects had to mentally animate the device, than in the
animated display conditions. One might expect that a correct encoding of the structure
would be more crucial to inferring the correct motion in the static conditions. The overall
correlation between the question answering and the drawings highlights the important role of
selective encoding, both when the display is static and when it is animated. In a cognitive
analysis of the components of mechanical ability, we found that one component is knowing
which components of a device are mechanically relevant (Hegarty, Just & Morrison, 1988). In
this particular task, such knowledge helps one know what is to be encoded. For example, it is
crucial to the functioning of the ratchet that the teeth be asymmetrical. However, some
subjects did not depict them as asymmetrical, and the likely interpretation is that they did
not encode the asymmetry as particularly important. Also, it is crucial to the ratchet device
that the lever pivot around a point, but some subjects did not indicate such pivots in their
drawings. In general, high mechanical subjects who didn't indicate the functionally important
aspects in their drawings also weren't able to answer questions about the motion of various
components.

Animated Display Condition

Description      n     Errors     Time on Diagram (sec)     Time on Text
Supplemented     7     1.1        166                       75
Normal           7     1.0        122                       50
Supplemented     3     4.3        104                       71
Normal           3     3.7        123                       35



Using animation. With both supplemented and normal descriptions, most high
knowledge subjects made multiple scans of the upper and lower paths and, correspondingly,
their error rates were low. Fourteen of the 20 subjects made an average of 2.2
complete traces of the lower path (which has more components, so that a trace is easier to
identify). The high ability subjects in the normal condition animated fewer times
than those in the supplemented condition, but subjects in both conditions usually animated a
kinematic pair in the context of scanning along a kinematic chain. The supplemented
condition provided structure that the high ability subjects used but, in some sense, may not
have needed, because they had the general strategy of following kinematic chains.

Summary. Animation graphics provide a potentially powerful tool for aiding the
comprehension of diagrammatic material. What the current research suggests, however, is
that animation is not the entire solution. In particular, lower-knowledge individuals still need
guidance from the text. Moreover, even relatively simple devices appear quite complex to
these less knowledgeable individuals, who have no schemas to identify the relevant
dimensions and separate them from the irrelevant but visually complex features. Animation
graphics do not necessarily improve their overall comprehension, in spite of clearly
eliminating some of the sources of error. In our ongoing research, we are now trying to
find out how less knowledgeable individuals or subjects with less spatial ability perceive
animated displays.